Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add command to update with GFA #115

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Add command to update with GFA #115

wants to merge 7 commits into from

Conversation

dkhofer
Copy link
Collaborator

@dkhofer dkhofer commented Dec 21, 2024

No description provided.

@@ -158,8 +158,8 @@ pub struct Link<T: SampleType, S: Opt, U: Opt> {
#[derive(Debug, Clone, Ord, Eq, PartialOrd, PartialEq)]
pub struct Path<T: SampleType, S: Opt, U: Opt> {
pub name: String,
pub dir: Vec<bool>,
pub nodes: Vec<T>,
pub strands: Vec<bool>,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt like renaming some of these fields to be clearer

@dkhofer dkhofer requested a review from Chris7 January 10, 2025 20:25
let mut path_strands_by_name = HashMap::new();
for path in &gfa.paths {
let path_name = path.name.clone();
path_segments_by_name.insert(path_name.clone(), path.segments.clone());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to clone these or can we pass a reference? seems like it can be a problem w/ bigger GFAs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed to reference

let path_name = &path.name;
if matched_path_names.contains(path_name.as_str()) {
for segment_id in path.segments.iter() {
matched_path_name_by_segment_id.insert(segment_id, path_name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't a segment be in multiple paths? Should this be a set of paths the segment belongs to

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, looking below a path segment id can be overwritten when shared and will flag it as an unmatched path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yep, fixed

let mut previous_node_strand = Strand::Forward;
let mut new_path_edges = vec![];
for (i, segment_id) in unmatched_path_segment_ids.iter().enumerate() {
if existing_path_ranges_by_segment_id.contains_key(segment_id) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you can do something like if let Some(start, end) = existing_path_ranges_by_segment_id.get(segment_id) to avoid the double lookup.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, done

.sequence_type("DNA")
.sequence(segment_sequence)
.save(conn);
let node_id = Node::create(conn, &sequence.hash, None);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should make a node hash so this can be repeated without infinite updates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but let me know if the hash string could be better

);

let temp_dir = tempdir().unwrap();
let gfa_diff_path = temp_dir.path().join("parent-child-diff.gfa");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we save this as a general fixture to have a test where we just have the initial simple.fa and no other intermediate assets the alternative paths could maybe be coming from?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, i gues that is the walk-diff.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, did this for the path diff too

let node_id = Node::create(
conn,
&sequence.hash,
calculate_hash(&format!("{segment_id}_{hash}", hash = &sequence.hash)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might want to prefix with the path id as well, so we don't have node reuse if this segment is in multiple paths.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a chicken-and-egg problem because we don't have the path ID yet, we need the nodes + edges between them in order to create the path. I added the path name here instead, that OK?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants